3M: Multi-style image caption generation using Multi-modality features under Multi-UPDOWN model
Authors
Abstract
In this paper, we build a multi-style generative model for stylish image captioning which uses multi-modality image features, ResNeXt features and text features generated by DenseCap. We propose the 3M model, a Multi-UPDOWN caption model that encodes multi-modality features and decodes them into captions. We demonstrate the effectiveness of our model on generating human-like captions by examining its performance on two datasets, the PERSONALITY-CAPTIONS dataset and the FlickrStyle10K dataset. We compare against a variety of state-of-the-art baselines on various automatic NLP metrics such as BLEU, ROUGE-L, CIDEr, and SPICE. A qualitative study has also been done to verify that our 3M model can be used for generating different stylized captions. Code will be available at https://github.com/cici-ai-club/3M.
Similar resources
Image Clustering Using Multi-visual Features
This paper presents a research on clustering an image collection using multi-visual features. The proposed method extracted a set of visual features from each image and performed multi-dimensional K-Means clustering on the whole collection. Furthermore, this work experiments on different number of visual features combination for clustering. 2, 3, 5 and 7 pair of visual features chosen from a to...
Multi-image Matching Using Segment Features
This paper presents a strategy for matching features in multiple images, which emphasises reliable matching and the recovery of feature extraction errors. The process starts from initial ‘good’ matches, which are validated in multiple images using multi-image constraints. These initial matches are then filtered through a relaxation procedure and are subsequently used to locally predict addition...
Image Multi-Classification using PHOW Features
Automatic labeling and classification of a vast number of images is a huge challenge, so machines are used as a part of image classification and annotation is turned into a prerequisite to adapt to the high improvement of advanced digital image innovations consistently. Scale Invariant Feature Transform (SIFT) is an image descriptor for image-based matching and recognition; this descriptor is u...
A Mathematical Model for Multi-Region, Multi-Source, Multi-Period Generation Expansion Planning in Renewable Energy for Country-Wide Generation-Transmission Planning
Environmental pollution and rapid depletion are among the chief concerns about fossil fuels such as oil, gas, and coal. Renewable energy sources do not suffer from such limitations and are considered the best choice to replace fossil fuels. The present study develops a mathematical model for optimal allocation of regional renewable energy to meet a country-wide demand and its other essential as...
Tags Re-ranking Using Multi-level Features in Automatic Image Annotation
Automatic image annotation is a process in which computer systems automatically assign the textual tags related with visual content to a query image. In most cases, inappropriate tags generated by the users as well as the images without any tags among the challenges available in this field have a negative effect on the query's result. In this paper, a new method is presented for automatic image...
Journal
Journal title: Proceedings of the ... International Florida Artificial Intelligence Research Society Conference
Year: 2021
ISSN: 2334-0762, 2334-0754
DOI: https://doi.org/10.32473/flairs.v34i1.128380